 imagen video


Why text-to-video may be the next 'big' AI thing - Times of India

#artificialintelligence

Generative Artificial Intelligence (AI) is expanding beyond text-to-image models, with the emergence of text-to-video. Runway's Gen-2 model and Google's Imagen Video and Phenaki models create videos based on text prompts, but the challenge lies in achieving precision and avoiding fake or misleading videos. Ethical challenges also arise, as AI-generated videos could be used for deception via the creation of deepfakes. However, with Big Tech already involved in the development of text-to-video models, it may not be long before this technology becomes mainstream. When it comes to generative AI, there's only one thing dominating the headlines -- ChatGPT.


AI and you: The good, the bad and the ugly

#artificialintelligence

Machine learning has come a long way since computer scientists began taking an interest in programming a computer to play chess in the 1940s. It was only in 1997 that IBM's Deep Blue supercomputer became the first machine to beat then-reigning world chess champion Garry Kasparov. Since then, researchers have been finding ways to make artificial intelligence (AI) smarter and more sophisticated, which prompts the question: Are humans at risk of being replaced by AI? "Static" chatbots, the kind that everyone finds annoying because they give a set of templated answers, may soon be a thing of the past, as researchers at OpenAI have trained a model dubbed ChatGPT to interact in a conversational manner. The AI research and deployment company claims that the dialogue-based AI chatbot can provide lengthy answers to various questions, write a song on any topic (try eggs), create slogans and even help to debug programs.


Deepfakes could get super advanced (and weird) thanks to these breakthroughs

#artificialintelligence

If you picture a cape-clad dog soaring through the clouds or an astronaut riding a horse on Mars, you may think you're experiencing a fever dream. But these surreal images exist outside of a sleepy daze: You can pull them up on your computer right now. They were created by Meta's top-of-the-line algorithms that can turn any text into a (somewhat) realistic video. Last month, Meta used these surreal clips to introduce its Make-A-Video AI text-to-video generator to the world. Just days later, Google showed off not one but two AI video generators: Imagen Video and Phenaki. These models were designed to transform text descriptions into short video clips.


La veille de la cybersécurité

#artificialintelligence

Not to be outdone by Meta's Make-A-Video, Google today detailed its work on Imagen Video, an AI system that can generate video clips given a text prompt (e.g., "a teddy bear washing dishes"). While the results aren't perfect -- the looping clips the system generates tend to have artifacts and noise -- Google claims that Imagen Video is a step toward a system with a "high degree of controllability" and world knowledge, including the ability to generate footage in a range of artistic styles. As my colleague Devin Coldewey noted in his piece about Make-A-Video, text-to-video systems aren't new. Earlier this year, a group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, which can translate text into reasonably high-fidelity short clips. But Imagen Video appears to be a significant leap over the previous state-of-the-art, showing an aptitude for animating captions that existing systems would have trouble understanding. "It's definitely an improvement," Matthew Guzdial, an assistant professor at the University of Alberta studying AI and machine learning, told TechCrunch via email.


Google answers Meta's video-generating AI with its own, dubbed Imagen Video

#artificialintelligence

Not to be outdone by Meta's Make-A-Video, Google today detailed its work on Imagen Video, an AI system that can generate video clips given a text prompt (e.g., "a teddy bear washing dishes"). While the results aren't perfect -- the looping clips the system generates tend to have artifacts and noise -- Google claims that Imagen Video is a step toward a system with a "high degree of controllability" and world knowledge, including the ability to generate footage in a range of artistic styles. As my colleague Devin Coldewey noted in his piece about Make-A-Video, text-to-video systems aren't new. Earlier this year, a group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, which can translate text into reasonably high-fidelity short clips. But Imagen Video appears to be a significant leap over the previous state-of-the-art, showing an aptitude for animating captions that existing systems would have trouble understanding.


Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

arXiv.org Artificial Intelligence

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding. See https://imagen.research.google/video/ for samples.
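
The abstract names the v-parameterization and classifier-free guidance without giving formulas. The sketch below uses the standard definitions from the diffusion literature, not code from the paper: a noisy sample z_t = alpha_t * x + sigma_t * eps, a prediction target v = alpha_t * eps - sigma_t * x, and a guided prediction (1 + w) * v_cond - w * v_uncond. The cosine schedule, the function names, and the short float lists standing in for video tensors are all assumptions made for illustration; no neural network is involved.

```python
import math
import random

def alpha_sigma(t):
    """Cosine schedule (assumed for illustration); satisfies alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def diffuse(x, eps, t):
    """Forward process: z_t = alpha_t * x + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]

def v_target(x, eps, t):
    """v-parameterization target: v = alpha_t * eps - sigma_t * x."""
    a, s = alpha_sigma(t)
    return [a * ei - s * xi for xi, ei in zip(x, eps)]

def x_from_v(z, v, t):
    """Recover the clean sample: x = alpha_t * z_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]

def guide(v_cond, v_uncond, w):
    """Classifier-free guidance; w = 0 returns the conditional prediction unchanged."""
    return [(1 + w) * c - w * u for c, u in zip(v_cond, v_uncond)]

# Toy data: 8 "pixels" in place of a video tensor (hypothetical values).
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(8)]
eps = [rng.gauss(0, 1) for _ in range(8)]
t = 0.3

z = diffuse(x, eps, t)
v = v_target(x, eps, t)
x_rec = x_from_v(z, v, t)  # a perfect v prediction recovers x exactly
max_err = max(abs(p - q) for p, q in zip(x, x_rec))
```

In a real system a trained network would predict v from z_t and the text embedding; here the exact target is used only to check that the identities hold.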